Evidence for Substructure in Ursa Minor Dwarf Spheroidal Galaxy
using a Bayesian Object Detection Method
ABSTRACT

We present a method for identifying localized secondary populations in stellar velocity data
using Bayesian statistical techniques. We apply this method to the dwarf spheroidal galaxy
Ursa Minor and find two secondary objects in this satellite of the Milky Way. One object is
kinematically cold, with a velocity dispersion of 4.25 ± 0.75 km s^-1, and is centered at
(9.1' ± 1.5', 7.2' ± 1.2') in RA and Dec relative to the center of Ursa Minor. The second
object has a large velocity offset of -12.8 km s^-1 relative to Ursa Minor and is centered
at (-14.0', -2.5'). The kinematically cold object has been found before using a smaller
data set, but our prediction that this cold object has a velocity dispersion larger than 2.0 km s^-1
at 95% C.L. differs from previous work. We use two- and three-component models, along with
information criteria and Bayesian evidence model selection methods, to argue that Ursa
Minor has one or two localized secondary populations. The significant probability of a large
velocity dispersion in each secondary object raises the intriguing possibility that each has its
own dark matter halo, that is, that it is a satellite of a satellite of the Milky Way.
keywords: Dark Matter: Substructure, Dwarf Galaxies: Ursa Minor, Bayesian Statistics 



1 INTRODUCTION 



The Milky Way dwarf spheroidal galaxies (dSphs) are the faintest
but most numerous of the Galactic satellites. About 22 dSphs have
been discovered, nine of which were known before the Sloan Digital
Sky Survey (SDSS); these nine are often collectively referred to as
the classical dSphs. Thanks to the advent of the SDSS, the number
of known Milky Way dSphs has thus more than doubled
(Willman et al. 2005; Belokurov et al. 2006; Zucker et al. 2006;
Belokurov et al. 2007; Walsh et al. 2007; Sakamoto & Hasegawa 2006;
Irwin et al. 2007). The classical systems are in general brighter and
more extended than their post-SDSS counterparts, which are usually
referred to as the ultra-faint dwarfs. The dSph population of the
Milky Way has a wide range of luminosities, 10^3-10^7 L_⊙, and sizes
(half-light radii) from 40 to 1000 pc (Mateo 1998; Simon & Geha
2007; Martin et al. 2008), but spans a narrow range of dynamical
mass: M(r < 300 pc) ≈ 10^7 M_⊙ for most of the dwarfs
(Strigari et al. 2008). In the hierarchical structure formation
scenario, these dSphs would reside in dark matter subhalos of the
Milky Way host halo, so the dynamical mass provides an estimate of
the amount of dark matter in subhalos. The dynamical mass-to-light
ratios span a large range of 8-4000 (in solar units); some of these
systems are the most dark-matter-dominated systems known
(Walker et al. 2009; Wolf et al. 2010; Simon et al. 2011;
Martinez et al. 2011).

Simulations also predict that subhalos should have their own
subhalos ("sub-subhalos"; e.g., Shaw et al. 2007; Kuhlen et al. 2008;
Springel et al. 2008; Diemand et al. 2008). While their presence in
cold dark matter simulations has been verified, the mass function
of these sub-subhalos has not been well quantified. The subhalo
mass function is seen to follow a universal profile when scaled to
the virial mass of the host halo. If the sub-subhalos follow the same
pattern, then we expect to see a sub-subhalo with
V_max ~ 0.3 V_max(subhalo) (Springel et al. 2008). This expectation
motivates us to search for stellar content that could be associated
with such sub-subhalos.

Several dSphs show signs of stellar substructure or multiple
distinct chemo-kinematic populations (Fornax, Sculptor, Sextans,
Ursa Minor, Canes Venatici I). For instance, Fornax has stellar
over-densities along its minor axis, possibly remnants of past
mergers (Coleman et al. 2004, 2005), and five globular clusters
(Mackey & Gilmore 2003). In addition, Fornax's metal-rich and
metal-poor stellar components seem to have different velocity
dispersions (Battaglia et al. 2006). Similarly, Sextans and Sculptor
each contain two kinematically distinct populations with different
metallicities (Bellazzini et al. 2001; Battaglia et al. 2008).
Sculptor's populations have different velocity dispersion profiles
in addition to their distinct metallicities (Battaglia et al. 2008),
whereas Sextans has a localized, kinematically distinct population
either near its center (Kleyna et al. 2004; Battaglia et al. 2011) or
near its core radius (Walker et al. 2006). There are claims of two
populations with distinct velocity and metallicity distributions in
the brightest ultra-faint dwarf, Canes Venatici I (CVI) (Ibata et al.
2006), but this is not seen in two other data sets (Simon & Geha
2007; Ural et al. 2010). The Bootes I ultra-faint dwarf could also have
two kinematically distinct populations with different scale lengths
(Koposov et al. 2011), although this was not apparent in earlier data
sets (Muñoz et al. 2006; Martin et al. 2007). The largest of these
Bootes I data sets contains 37 member stars, and this has to be
weighed against the results of Ural et al. (2010), who suggest that at
least 100 stars are required to differentiate two populations.

Table 1. Observed and derived properties of Ursa Minor.
Parameter                                Value                            Ref.
Distance                                 77 ± 4 kpc                       1
Luminosity                               3 × 10^5 L_⊙,V                   1
Core radius                              17.9' ± 2.1'                     1
Tidal radius                             77.9' ± 8.9'                     1
Half-light radius                        0.445 ± 0.044 kpc                1
Deprojected half-light radius (r_1/2)    0.588 ± 0.058 kpc                1
Average velocity dispersion              11.61 ± 0.63 km s^-1             2
Mean velocity                            -247 km s^-1                     2
Dynamical mass within r_1/2              5.56 (+…/-…) × 10^7 M_⊙          1
Mass-to-light ratio within r_1/2         290 (+…/-…) M_⊙/L_⊙              1
Ellipticity                              0.56 ± 0.05                      3
Center (J2000.0)                         (15h09m10.2s, +67°12'52")        4
Position angle                           49.4°                            5
Note: References are as follows: 1. Wolf et al. (2010) and references
therein; 2. This paper; 3. Mateo (1998); 4. Kleyna et al. (2003);
5. Kleyna et al. (1998).

Among the classical dSphs, only Draco has a lower V-band
luminosity, but Ursa Minor is twice as extended as Draco (in terms
of half-light radius; Irwin & Hatzidimitriou 1995; Palma et al.
2003). Its observed and derived properties are summarized in Table
1. Ursa Minor is also likely the most massive satellite in terms
of its dark matter halo, apart from the Magellanic Clouds and the
disrupting Sagittarius dSph. These properties make Ursa Minor an
ideal target in which to search for substructure. The V_max at infall
of the subhalo hosting Ursa Minor should be greater than 25 km/s but
probably no larger than about 50 km/s (Boylan-Kolchin et al. 2012), and
thus we can expect Ursa Minor to host a sub-subhalo with V_max
in the range 8-16 km s^-1. Despite its low mass, such a small
sub-subhalo could have held on to its gas because it was protected
by the deeper potential well of Ursa Minor.
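The V_max range quoted above follows from scaling the host's infall V_max by the sub-subhalo ratio of roughly 0.3 discussed earlier. The short sketch below makes that arithmetic explicit. Treating the ratio as a single exact constant (the name `SUB_SUBHALO_VMAX_RATIO` is ours) is a simplifying assumption, and the result, 7.5-15 km s^-1, only roughly reproduces the rounded 8-16 km s^-1 range in the text.

```python
# Hypothetical constant: the ~0.3 V_max ratio between the largest sub-subhalo and
# its host subhalo (Springel et al. 2008), treated as exact for illustration only.
SUB_SUBHALO_VMAX_RATIO = 0.3


def expected_sub_subhalo_vmax(host_vmax_kms: float) -> float:
    """Expected V_max (km/s) of the largest sub-subhalo for a given host V_max."""
    return SUB_SUBHALO_VMAX_RATIO * host_vmax_kms


# Infall V_max range of Ursa Minor's host subhalo quoted in the text (km/s).
low = expected_sub_subhalo_vmax(25.0)
high = expected_sub_subhalo_vmax(50.0)
print(f"expected sub-subhalo V_max: {low:.1f} - {high:.1f} km/s")
```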

Several photometric studies, with different magnitude limits
and areal coverage, have reported localized stellar components
that deviate from a smooth density profile (Olszewski & Aaronson
1985; Kleyna et al. 1998; Palma et al. 2003), particularly near the
center (Demers et al. 1995; Eskridge & Schweitzer 2001). To the
northwest of the center, a secondary peak in the spatial distribution
is seen in contours and isopleths (Irwin & Hatzidimitriou 1995;
Kleyna et al. 1998; Bellazzini et al. 2002; Palma et al. 2003).
However, different studies have concluded that this secondary peak
is inconclusive or of low significance (Irwin & Hatzidimitriou 1995;
Kleyna et al. 1998; Bellazzini et al. 2002; Palma et al. 2003).
Smaller-scale stellar substructure is, however, seen with higher
significance (Eskridge & Schweitzer 2001; Bellazzini et al. 2002).
Combining proper-motion information with shallow photometric
data in the central 20 arcmin of Ursa Minor, Eskridge & Schweitzer
(2001) claim that the distribution of stars in Ursa Minor shows
high significance for substructure in clumps ~3.0' in size.
Bellazzini et al. (2002) used the presence of a secondary peak in the
distribution of the distance to the 200th nearest neighboring star to
argue that the surface density profile of Ursa Minor is not smooth.
In addition, the stellar density is not symmetric along the major
axis, with the density falling more rapidly on the western side
(Eskridge & Schweitzer 2001; Palma et al. 2003). A statistically
significant S-shaped morphology is also seen in contours of the red
giant branch stars (Palma et al. 2003).
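Bellazzini et al. (2002) is described here only at the level above. As an illustration of the statistic itself, the distance from each star to its k-th nearest neighbour can be computed as follows. This is a minimal brute-force sketch: the positions are random stand-ins, and the histogram only shows where one would look for a secondary peak, not how Bellazzini et al. assessed its significance.

```python
import numpy as np


def kth_neighbor_distance(x, y, k):
    """Distance from every star to its k-th nearest neighbour.

    Brute-force O(N^2) pairwise distances, which is fine for a few thousand stars.
    Requires k < len(x).
    """
    pts = np.column_stack([np.asarray(x, float), np.asarray(y, float)])
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the star itself (distance 0),
    # so the k-th nearest neighbour sits in column k.
    return np.sort(dist, axis=1)[:, k]


# Usage with stand-in positions (arcmin); the published test used k = 200
# on the real photometric catalogue.
rng = np.random.default_rng(0)
x, y = rng.uniform(-30, 30, 1000), rng.uniform(-20, 20, 1000)
d200 = kth_neighbor_distance(x, y, k=200)
counts, edges = np.histogram(d200, bins=30)   # inspect for a secondary peak
```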



Spectroscopic studies of Ursa Minor (Hargreaves et al. 1994;
Armandroff et al. 1995; Kleyna et al. 2003; Wilkinson et al. 2004;
Muñoz et al. 2005) have shown a relatively flat velocity dispersion
profile of σ ≈ 8-12 km s^-1. Kleyna et al. (2003) (K03) used a two-
component model to test whether the second peak in the photometry
was a real feature. They found a second kinematically distinct
population with σ = 0.5 km s^-1 and Δv = -1 km s^-1. Our results lend
support to this discovery by K03, but we do not agree on the
magnitude of the velocity dispersion of the substructure. We discuss
this in greater detail later.

K03 argued through numerical simulations that the stellar
clump they discovered could survive if the dark matter halo of Ursa
Minor had a large core (about 0.85 kpc), but not if it had the 1/r
inner cusp predicted for halos by CDM simulations (Navarro et al.
1997). Similar numerical simulations including the Ursa Minor
stellar clump have confirmed this result (Lora et al. 2012). Similar
conclusions have been reached using the observed projected spatial
distribution of the five globular clusters in the Fornax dSph (Mackey
& Gilmore 2003). The survival of these old globular clusters has been
interpreted as evidence that the dark matter halo of Fornax may have
a large core, in stark contrast to the predictions of dark-matter-only
CDM simulations (Goerdt et al. 2006; Sánchez-Salcedo et al. 2006;
Cowsik et al. 2009; Cole et al. 2012). Thus, the study of the properties
of the substructure in Ursa Minor has far-reaching implications for
the dark matter halo of this dSph and, by extension, for the properties
of the dark matter particle. Our study is complementary to recent
studies that use the presence of multiple stellar populations in Fornax
and Sculptor, which also seem to point towards a cored dark matter
density profile (Battaglia et al. 2008; Walker & Peñarrubia 2011;
Amorisco & Evans 2012).

Current methods for finding kinematic substructure in dSphs
have relied on likelihood-comparison parameter tests (Kleyna et al.
2003; Ural et al. 2010), the non-parametric Nadaraya-Watson
estimator (Walker et al. 2011), or metallicity cuts combined with
kinematics (Battaglia et al. 2006), but not on Bayesian methods.



Hobson & McLachlan (2003) presented a Bayesian method for
finding objects in noisy data. Their object detection method can
find two or more objects in photometric data using only a two-
component model. This method can be extended to include
spectroscopic line-of-sight velocity data, so that objects can be
searched for using kinematics as well as structural properties. We
extend and apply this method to Ursa Minor to search for the
reported stellar substructure (Irwin & Hatzidimitriou 1995; Kleyna
et al. 1998) and the kinematically cold feature found by K03.

1.1 Data and Motivation for more Complex Models 

The spectroscopic data used contain 212 Ursa Minor member stars
(Muñoz et al. 2005); the sample that K03 used to discover the cold
feature contained 134 stars. Figure 1 (left) shows the binned radial
velocities with the best-fit single-component Gaussian: this is a
reasonable fit. The data are, however, fit better by a three-
component Gaussian model (cf. Figure 1, right). The means and
dispersions of these Gaussian distributions were derived with the
Bayesian object detection method that is the subject of this paper.
As a prelude to our final results, we note that the centers of all three
populations (the primary and two secondaries) found through the
object detection method are spatially segregated.
Figure 1. The binned line-of-sight velocity data (red dashed) in Ursa Minor. Right: over-plotted is the most probable Gaussian, with σ = 11.51 km s^-1 and
v̄ = -247.25 km s^-1 (black solid), from the null model (single Gaussian component). Left: the line-of-sight velocity distributions of the secondary objects and
the primary population. The lines correspond to the Gaussian velocity distributions of the different populations found with the Bayesian object detection method:
velocity offset object (blue dot-dot-space), cold object (green dotted), primary distribution (purple dot-dash), and the total (black solid). Each component is weighted by its
average number of stars found using the Bayesian object detection method. The additional kinematic components provide a better fit to the Ursa Minor data.



Before we develop the Bayesian methodology, we would like
to dissect the data to see whether secondary populations are visible
as strong local deviations in either mean velocity or velocity
dispersion. To this end, we define a fine grid over a 50' × 30' region
around the center of Ursa Minor and, for each grid point, find the
average velocity v̄ and velocity dispersion σ of the stars in a 5' × 5'
bin using the expectation-maximization (EM) method (see Equations
12b and 13 of Walker et al. 2009b). We disregard grid points with
fewer than 7 stars in the bin. The smoothed σ and v̄ maps created in
this way are shown in Figure 2: the velocity dispersion map is in the
upper left panel and the average velocity map is in the upper right
panel. The data are rotated such that the major axis is aligned with
the abscissa (θ = 49.4°; see Table 1 for the photometric properties
of Ursa Minor that we use). Two interesting features are evident: in
the σ map, roughly centered at (11', -4'), σ is significantly lower than
in the rest of the galaxy (σ < 6 km s^-1), and in the v̄ map, centered
at (-13', 6'), v̄ differs significantly from Ursa Minor's overall average
(|Δv| > 10 km s^-1). For reference, the entire data set has σ = 11.5
km s^-1 and v̄ = -247.2 km s^-1 with the EM method, and σ = 11.6 ±
0.6 km s^-1 and v̄ = -247.2 ± 0.8 km s^-1 using a single-component
Gaussian model sampled with a Bayesian nested sampling technique
(see the next section for an explanation of the Bayesian methods we
use). We have also plotted the number density (lower left panel) and
the positions of the stars (lower right panel) in Figure 2 to give a
sense of where the data lie and how significant the features in the v̄
and σ maps are. The number density map is created in the same way
as the v̄ and σ maps, and it shows that both features lie in reasonably
well-sampled regions. In the plot of stellar positions, we have also
indicated the most probable locations of the centers and the extents of
the two features, as found by our Bayesian object detection method.
We caution the reader that the plotted extents (tidal radii) of these
features have large error bars (see Table 2).
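The EM step and the gridding can be sketched as follows. Equations 12b and 13 of Walker et al. (2009b) are not reproduced in this paper; as an assumption, we take them to reduce, when every star is treated as a member, to the standard fixed-point update for a single Gaussian with known errors: v̄ = Σ w_i v_i / Σ w_i and σ² = Σ w_i² (v_i - v̄)² / Σ w_i, with w_i = 1/(1 + ε_i²/σ²), whose fixed point is the maximum-likelihood solution. The 5' × 5' boxes and 7-star threshold follow the text; the smoothing applied to Figure 2 is omitted, and the function names are illustrative.

```python
import numpy as np


def em_mean_dispersion(v, err, n_iter=500, tol=1e-10):
    """Mean velocity and intrinsic dispersion of one Gaussian population with known
    per-star errors, from fixed-point (EM-style) maximum-likelihood iterations."""
    v = np.asarray(v, float)
    err = np.asarray(err, float)
    mean, var = v.mean(), max(v.var(), 1e-12)
    for _ in range(n_iter):
        w = 1.0 / (1.0 + err ** 2 / var)             # noisy stars get less weight
        new_mean = np.sum(w * v) / np.sum(w)
        new_var = max(np.sum(w ** 2 * (v - new_mean) ** 2) / np.sum(w), 1e-12)
        converged = abs(new_mean - mean) < tol and abs(new_var - var) < tol
        mean, var = new_mean, new_var
        if converged:
            break
    return mean, np.sqrt(var)


def kinematic_maps(x, y, v, err, grid_x, grid_y, half_width=2.5, min_stars=7):
    """Mean-velocity and dispersion maps on a grid, using a (2 * half_width)-wide
    square box around each grid point; boxes with fewer than min_stars stay NaN."""
    x, y, v, err = (np.asarray(a, float) for a in (x, y, v, err))
    mean_map = np.full((len(grid_y), len(grid_x)), np.nan)
    disp_map = np.full_like(mean_map, np.nan)
    for j, gy in enumerate(grid_y):
        for i, gx in enumerate(grid_x):
            sel = (np.abs(x - gx) <= half_width) & (np.abs(y - gy) <= half_width)
            if sel.sum() >= min_stars:
                mean_map[j, i], disp_map[j, i] = em_mean_dispersion(v[sel], err[sel])
    return mean_map, disp_map
```

With positions in arcmin, `half_width=2.5` gives the 5' × 5' bins used above, and grids spanning ±25' and ±15' cover the 50' × 30' region.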

The center of the dip in the velocity dispersion (upper left
panel of Figure 2) is near the spectroscopic feature found by
K03 and the secondary density peak seen in the photometry by
several authors (Irwin & Hatzidimitriou 1995; Kleyna et al. 1998;
Bellazzini et al. 2002; Palma et al. 2003). The average velocity
feature we see does not correspond to any previously noted
photometric or kinematic feature. However, we note that the stellar
isodensity contours of Ursa Minor are significantly asymmetric
(Kleyna et al. 1998; Palma et al. 2003) and could hide both features.

Here we aim to show that these two localized kinematic fea- 
tures in Ursa Minor are statistically significant. We now turn to 
describing our Bayesian object detection method for finding sec- 
ondary objects and model selection methods for assessing their sig- 
nificance. 



2 METHODOLOGY: THEORY 

This paper has two primary objectives: to present a statistical
methodology for detecting discrete features within a kinematic data
set, and to apply this methodology to the Milky Way satellite galaxy
Ursa Minor. In this section we detail the statistical techniques used
to detect kinematic objects within the Ursa Minor data set. The
pertinent question we are addressing is whether statistically distinct
kinematic objects can be detected within a galaxy's stellar line-of-
sight kinematic data and, if such an object is detected, how certain
we can be that it is an actual physical attribute of the system. Thus
we require that any methodology used to detect multiple smaller
objects within the kinematic data set have two important properties.
First, it must be able to discern an unspecified number of statistically
separable features within a galaxy's kinematic data set. Second, it
must allow some determination of the significance of a proposed
object detection.

To meet these criteria, we employ a Bayesian object detection
technique first introduced by Hobson & McLachlan (2003). In our
implementation, the data distribution is modeled with two separate
components: a background distribution, referred to as the primary
distribution (in our case, the Ursa Minor dSph; P_p), and a 'secondary'
distribution (P_s), which is interpreted here as a feature or object in
the Ursa Minor data set. Thus, the actual distribution is of the form:

$$ P(d_i|\theta) = (1 - F)\, P_p(d_i|\theta_p) + F\, P_s(d_i|\theta_s) \qquad (1) $$

where F is the total fraction of stars in the secondary population,
d_i represents an individual element of the Ursa Minor data set D
Figure 2. The local kinematics of Ursa Minor using the Muñoz et al. (2005) data set. Upper Left: a map of the velocity dispersion of Ursa Minor. A portion
of the lower right quadrant drops below 6 km s^-1, while the rest of the galaxy is relatively uniform. Upper Right: the average velocity of Ursa Minor, found
concurrently with the velocity dispersion. In the upper left quadrant the velocity deviates from that of Ursa Minor by 10-15 km s^-1 or more, while elsewhere
the deviation does not exceed 5 km s^-1. To make the contour plots, the velocity dispersion and the average velocity were found within a 5' × 5' bin (5' = 110 pc
at a distance of 77 kpc). Lower Left: the stellar density profile of the stars in the Muñoz et al. (2005) data set. Lower Right: the most probable locations and
sizes (tidal radii) of the two objects found with the Bayesian object detection method in Ursa Minor. Both locations correspond to the deviations seen in
the average velocity and velocity dispersion maps. The coordinate system is oriented so that the x-axis lies along the major axis, which has a position
angle of 49.4° (Kleyna et al. 1998). The adopted center of Ursa Minor is RA = 15h09m10.2s, Dec = +67°12'52" (J2000.0) (K03). For the entire sample,
we obtain a mean velocity v̄ = -247.25 km s^-1 and velocity dispersion σ = 11.51 km s^-1.



(D = {d_i}), and θ_p and θ_s denote the parameter sets of the respective
distributions' models. A major benefit of this type of analysis is that
data sets with multiple features will cause the secondary population
parameter posteriors to become multi-modal, with each individual
mode representing a unique feature. This enables us to search for
an arbitrary number of objects without requiring an overly
complicated probability distribution. In addition, the local Bayesian
evidence of each mode can be used as a selection criterion. The
evidence Z = P(D|H) is equal to the integral of the product of
the likelihood, L(θ) = P(D|θ, H) = ∏_i P(d_i|θ, H), and the prior
probability, Pr(θ) = P(θ|H):

$$ Z = \int \mathcal{L}(\theta)\, \Pr(\theta)\, d\theta \qquad (2) $$

Here, the probability density of the parameter set θ (i.e.,
P(θ|D, H)), or posterior, is related to the evidence by Bayes'
theorem:

$$ P(\theta|D, H) = \frac{\mathcal{L}(\theta)\, \Pr(\theta)}{Z} \qquad (3) $$
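Equations (1)-(3) can be made concrete with a toy problem in which the only free parameter is the secondary fraction F, so the evidence integral can be done on a grid instead of with nested sampling. This is a sketch under stated assumptions: the component means and dispersions are hypothetical stand-ins held fixed (the primary values echo the single-Gaussian fit quoted earlier), the velocities are simulated, positions are ignored, and the prior on F is flat on [0, 1]. Under the null hypothesis F = 0 has no free parameters, so its evidence is simply its likelihood.

```python
import numpy as np


def gaussian(v, mean, sigma):
    return np.exp(-0.5 * ((v - mean) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid NumPy-version differences."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


def log_likelihood(F, v, primary=(-247.0, 11.5), secondary=(-258.0, 4.0)):
    """ln L(F) = sum_i ln[(1 - F) P_p(v_i) + F P_s(v_i)], i.e. Eq. (1) with every
    component parameter held fixed so that F is the only free parameter."""
    p = (1.0 - F) * gaussian(v, *primary) + F * gaussian(v, *secondary)
    return np.sum(np.log(p))


# Stand-in data: 180 "primary" and 30 "secondary" velocities (km/s).
rng = np.random.default_rng(1)
v = np.concatenate([rng.normal(-247.0, 11.5, 180), rng.normal(-258.0, 4.0, 30)])

F_grid = np.linspace(0.0, 1.0, 2001)
lnL = np.array([log_likelihood(F, v) for F in F_grid])
prior = np.ones_like(F_grid)                 # flat prior Pr(F) on [0, 1]
shift = lnL.max()                            # rescale to avoid underflow
integrand = np.exp(lnL - shift) * prior
Z_scaled = trapezoid(integrand, F_grid)      # Eq. (2), times exp(-shift)
ln_Z1 = np.log(Z_scaled) + shift
posterior = integrand / Z_scaled             # Eq. (3), normalized over F_grid
ln_Z0 = log_likelihood(0.0, v)               # H0: F = 0, no free parameters
print(f"ln Z1 = {ln_Z1:.2f}, ln Z0 = {ln_Z0:.2f}")
```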



Later, we use the evidence as a criterion for selecting between two
models, or hypotheses (H): one that assumes a 'secondary' feature,
represented by Equation (1) (H_1), and a 'null hypothesis' that
assumes only the background distribution P_p (H_0). In Section
2.2 we use the evidence both directly, in the ratio of evidences or
Bayes factor, and indirectly, in the determination of the Kullback-
Leibler divergence, a quantity that measures the amount of
information gained by assuming one hypothesis over another.
Through a large set of Monte Carlo simulations, these criteria are
then used to derive confidence levels for the exclusion of the null
hypothesis.
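The Monte Carlo step amounts to comparing an observed statistic with its distribution across mock data sets generated under H_0. The sketch below is generic rather than the paper's exact calibration. It assumes that larger values favor H_1, which holds for ln B_01, D_KL, and the total membership; a statistic for which smaller is better, such as the DIC, would be negated first. The function names and numbers are placeholders.

```python
import numpy as np


def null_threshold(null_stats, confidence=0.95):
    """Value of a model-selection statistic exceeded by only a fraction
    (1 - confidence) of the mock data sets generated under H0."""
    return float(np.quantile(np.asarray(null_stats, float), confidence))


def exclusion_confidence(observed_stat, null_stats):
    """Fraction of H0 mocks whose statistic lies below the observed value: the
    empirical confidence level at which H0 is excluded."""
    return float(np.mean(np.asarray(null_stats, float) < observed_stat))


# Usage with placeholder numbers (not results from this paper).
null_lnB = np.arange(100.0)            # stand-in statistics from 100 null mocks
print(null_threshold(null_lnB))        # 95th percentile of the null distribution
print(exclusion_confidence(94.5, null_lnB))
```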



Calculation of the above quantities and sampling of the posterior
space were done using a Bayesian nested sampling technique
(Skilling 2004; Feroz et al. 2009). We chose this sampling algorithm
because it has all the capabilities this project requires: multi-modal
posteriors can be explored efficiently, and the evidence is computed
as an integral part of the sampling.
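The core idea behind the evidence calculation in nested sampling is as follows. Live points drawn from the prior are repeatedly replaced under a rising likelihood threshold, while the enclosed prior volume X shrinks by about a factor exp(-1/N) per step, so Z = ∫ L dX accumulates as a sum. The sketch below is a toy, single-mode version of that idea. It is not MultiNest (Feroz et al. 2009), which samples within ellipsoidal bounds so that multi-modal posteriors can be handled efficiently. Here replacement points come from plain rejection sampling of the prior, which is workable only for cheap, low-dimensional problems.

```python
import numpy as np


def nested_sampling(log_like, prior_sample, n_live=100, tol=1e-3, max_iter=20000, seed=0):
    """Minimal nested-sampling estimate of ln Z (after Skilling 2004).

    log_like(theta) must accept an array of samples and return an array of ln L;
    prior_sample(rng, n) must return n independent draws from the prior.
    """
    rng = np.random.default_rng(seed)
    live = prior_sample(rng, n_live)
    live_logL = log_like(live)
    log_Z = -np.inf
    log_X = 0.0                                        # ln(remaining prior volume)
    for i in range(1, max_iter + 1):
        worst = int(np.argmin(live_logL))
        log_X_new = -i / n_live                        # expected shrinkage per step
        log_w = log_X + np.log1p(-np.exp(log_X_new - log_X))   # ln(X_{i-1} - X_i)
        log_Z = np.logaddexp(log_Z, live_logL[worst] + log_w)
        log_X = log_X_new
        threshold = live_logL[worst]
        while True:                                    # draw until L > L_worst
            cand = prior_sample(rng, 256)
            cand_logL = log_like(cand)
            ok = np.flatnonzero(cand_logL > threshold)
            if ok.size:
                live[worst], live_logL[worst] = cand[ok[0]], cand_logL[ok[0]]
                break
        # Stop once the live points can no longer change Z by more than tol.
        if live_logL.max() + log_X < log_Z + np.log(tol):
            break
    # Add the remaining live points, each carrying an equal share of X.
    log_mean_L = np.logaddexp.reduce(live_logL) - np.log(n_live)
    return np.logaddexp(log_Z, log_X + log_mean_L)


# Usage: standard-normal likelihood with a uniform prior on [-5, 5], for which
# Z = 0.1 * P(|theta| < 5), i.e. ln Z is close to ln(0.1).
ln_Z = nested_sampling(lambda t: -0.5 * t ** 2 - 0.5 * np.log(2.0 * np.pi),
                       lambda rng, n: rng.uniform(-5.0, 5.0, n))
print(f"ln Z = {ln_Z:.2f}")
```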
2.1 Likelihood

Our methodology uses a two-component probability distribution
similar to that of K03 (see also Martinez et al. 2011). We base the
'primary' (p) and 'secondary' (s) probability distributions on a
Gaussian with mean velocity v̄_{p,s} whose spread combines a
constant intrinsic velocity dispersion σ_{p,s} with the velocity errors ε_i:

$$ P_{p,s}(v_i, R_i|\theta_{p,s}) = \frac{\rho_{p,s}(R_i)}{N_{p,s}}\, \frac{1}{\sqrt{2\pi(\sigma_{p,s}^2 + \epsilon_i^2)}} \exp\left[-\frac{(v_i - \bar{v}_{p,s})^2}{2(\sigma_{p,s}^2 + \epsilon_i^2)}\right] \qquad (4) $$

Here, ρ_{p,s}(R) is the 2-d stellar number density, normalized to the
total number of stars in the population (N_{p,s}).

Unfortunately, because of spatial selection biases, ρ_{p,s}(R) is
difficult to model. To account for this uncertainty, we consider only
the 'conditional' likelihood (see Martinez et al. 2011 for details):

$$ P_{p,s}(v_i|R_i, \theta_{p,s}) = \frac{P_{p,s}(v_i, R_i|\theta_{p,s})}{\rho_{p,s}(R_i)/N_{p,s}} \qquad (5) $$

With this, Equation (1) becomes:

$$ P(v_i|R_i, \theta) = [1 - f(R_i)]\, P_p(v_i|R_i, \theta_p) + f(R_i)\, P_s(v_i|R_i, \theta_s) \qquad (6) $$

where f(R_i) is now the 'local' fraction of stars in the secondary
population, defined by

$$ f(R_i) = \frac{\rho_s(R_i|\theta_s)}{\rho_s(R_i|\theta_s) + \alpha^{-1}\, \rho_p(R_i|\theta_p)} \qquad (7) $$



Here, we have introduced the variable α = N_s/N_p. Instead of varying
α directly, we found that, in some instances, using the total fraction
as a free parameter simplifies the analysis:

$$ F_{\rm total} = \frac{\int \rho_s\, dx\, dy}{\int \rho_s\, dx\, dy + \alpha^{-1} \int \rho_p\, dx\, dy} \qquad (8) $$
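Equations (5)-(8) translate into a short likelihood routine. In the sketch below, the profile values at each star's position are passed in precomputed. α is either given directly or obtained from F_total by inverting Eq. (8): α = F_total ∫ρ_p / [(1 - F_total) ∫ρ_s]. The function names and the tuple layout of `theta` are illustrative assumptions, not the paper's code.

```python
import numpy as np


def gauss_with_errors(v, err, mean, sigma):
    """Eq. (5): a Gaussian of intrinsic width sigma broadened by each star's error."""
    var = sigma ** 2 + err ** 2
    return np.exp(-0.5 * (v - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def alpha_from_total_fraction(F_total, int_rho_s, int_rho_p):
    """Invert Eq. (8) for alpha = N_s / N_p, given the field integrals of the
    secondary and primary profiles."""
    return F_total * int_rho_p / ((1.0 - F_total) * int_rho_s)


def local_fraction(rho_s, rho_p, alpha):
    """Eq. (7): f(R_i) = rho_s / (rho_s + rho_p / alpha). Assumes rho_p > 0 at
    every member star, so the denominator never vanishes."""
    return rho_s / (rho_s + rho_p / alpha)


def log_likelihood(v, err, rho_s, rho_p, theta):
    """Conditional log-likelihood of Eq. (6), summed over stars.

    theta = (v_p, sigma_p, v_s, sigma_s, alpha); rho_s and rho_p are the
    profile values evaluated at each star's position."""
    v_p, sigma_p, v_s, sigma_s, alpha = theta
    f = local_fraction(rho_s, rho_p, alpha)
    like = ((1.0 - f) * gauss_with_errors(v, err, v_p, sigma_p)
            + f * gauss_with_errors(v, err, v_s, sigma_s))
    return np.sum(np.log(like))


# Usage with placeholder values for four stars.
v = np.array([-250.0, -245.0, -240.0, -247.0])
err = np.full(4, 2.0)
rho_p = np.array([1.0, 0.8, 0.6, 0.9])
rho_s = np.array([0.0, 1.0, 0.0, 1.0])        # top-hat: two stars inside the object
alpha = alpha_from_total_fraction(0.1, int_rho_s=0.5, int_rho_p=2.0)
lnL = log_likelihood(v, err, rho_s, rho_p, (-247.0, 11.5, -246.0, 4.0, alpha))
```

α enters only through α^{-1}ρ_p, and Eq. (8) fixes it from integrals of the same profiles. As a result, any overall normalization of ρ_s or ρ_p cancels from f(R_i) when F_total is the free parameter, which may be part of why sampling in F_total is simpler.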



For the primary population, we assume a King 2-d density profile
whose parameters are fixed to the observed photometry. The
secondary object's density profile is taken to be a top-hat. Our
Bayesian object detection model has 8 parameters: 2 primary
kinematic parameters, 2 secondary kinematic parameters, the x and
y center and the tidal radius of the secondary population, and the
total fraction. The parameters, priors, and posteriors are listed in
the first section of Table 2.¹
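The two spatial profiles, together with the field integrals that Eq. (8) needs, can be sketched as below. All of the following are our assumptions: the King (1962) form with an elliptical radius r² = x² + y²/(1 - e)² in the rotated frame, a circular top-hat, a rectangular field borrowed from the x_cen and y_cen prior box, and a midpoint-rule integral. The primary parameters are the photometric values quoted in Section 2.3, the top-hat values echo the cold spot in Table 2, and F_total = 0.08 is a placeholder.

```python
import numpy as np


def king_profile(x, y, r_core, r_tidal, ellipticity):
    """Unnormalized King (1962) surface density with the major axis along x."""
    r = np.sqrt(x ** 2 + (y / (1.0 - ellipticity)) ** 2)        # elliptical radius
    term = (1.0 / np.sqrt(1.0 + (r / r_core) ** 2)
            - 1.0 / np.sqrt(1.0 + (r_tidal / r_core) ** 2))
    return np.where(r < r_tidal, term ** 2, 0.0)


def top_hat_profile(x, y, x_cen, y_cen, r_tidal):
    """Unnormalized top-hat: 1 inside a circle of radius r_tidal, 0 outside."""
    return (((x - x_cen) ** 2 + (y - y_cen) ** 2) < r_tidal ** 2).astype(float)


def field_integral(profile, x_range, y_range, n=800):
    """Midpoint-rule integral of profile(x, y) over a rectangular field."""
    xe = np.linspace(x_range[0], x_range[1], n + 1)
    ye = np.linspace(y_range[0], y_range[1], n + 1)
    X, Y = np.meshgrid(0.5 * (xe[1:] + xe[:-1]), 0.5 * (ye[1:] + ye[:-1]))
    return profile(X, Y).sum() * (xe[1] - xe[0]) * (ye[1] - ye[0])


def primary(x, y):
    # King profile fixed to the photometry (Table 1 radii converted to kpc).
    return king_profile(x, y, r_core=0.401, r_tidal=1.745, ellipticity=0.56)


def secondary(x, y):
    # Top-hat near the cold spot's position and size (Table 2), in kpc.
    return top_hat_profile(x, y, x_cen=0.26, y_cen=-0.07, r_tidal=0.151)


field = ((-0.6, 0.6), (-0.4, 0.4))   # hypothetical field (kpc)
I_p = field_integral(primary, *field)
I_s = field_integral(secondary, *field)
F_total = 0.08                        # hypothetical total fraction
alpha = F_total * I_p / ((1.0 - F_total) * I_s)   # Eq. (8) solved for alpha
```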



2.2 Model Selection 

Even with accurate probability density modeling and thorough
parameter space exploration, any object detection methodology will
have fairly limited capabilities if the significance of a detection
cannot be determined. In our method, we use several model
selection techniques to assess the significance of finding such an
object. Here, the posterior, likelihood, and evidence are used as the
basis for selection criteria that measure the suitability of a
hypothesis. The two hypotheses compared are a model that contains
no sub-component feature (the 'null hypothesis', H_0) and a model
containing a sub-population (H_1). Model selection techniques
generally fall into two categories: those derived from the Bayesian
evidence, and those based on information theory (specifically the
Kullback-Leibler divergence (D_KL), or information entropy). Among
the most common are the Bayes factor, the Bayesian information
criterion (BIC), the Akaike information criterion (AIC) (Akaike
1974), the deviance information criterion (DIC) (Spiegelhalter et al.
2002), and direct calculation of the Kullback-Leibler divergence
(D_KL) (Kullback & Leibler 1951). (For a review of the use of
information criteria in cosmology, see Liddle 2007; for more general
reviews of model selection, particularly Bayesian methods in
cosmology, see Liddle et al. 2006 and Trotta 2008.) In this paper we
use the Bayes factor, the DIC, and D_KL to derive confidence levels
quantitatively. We do not discuss the BIC or AIC, since they are
large-sample (Gaussian) approximations of the evidence and of
D_KL, respectively.

¹ Other secondary profiles were tried, including a King and a Plummer
profile. We detected both objects in all cases. The scale radii of these
stellar profiles were unconstrained, and the errors were larger than
with the top-hat.

The Bayes factor is the ratio of the evidences of two models or
hypotheses. For example, the Bayes factor between two hypotheses
H_0 and H_1 (single component versus multiple components) is
defined to be

$$ B_{01} = \frac{Z_1}{Z_0} = \frac{P(D|H_1)}{P(D|H_0)} \qquad (9) $$

The general rule of thumb is that B_01 > 1 favors hypothesis H_1
and B_01 < 1 favors hypothesis H_0. The significance of B_01 is
usually assessed using ln B_01, with ln B_01 < 1, 1 < ln B_01 < 2.5,
2.5 < ln B_01 < 5, and ln B_01 > 5 corresponding to inconclusive,
weak, moderate, and strong evidence, respectively, in favor of
hypothesis H_1. The Bayes factor has the advantage of being an
output of our sampling algorithm. Its main disadvantage, however,
is that it inherently penalizes the model whose parameter space has
more degrees of freedom. This can make the significance of a
detection ambiguous, because the Bayes factor will naturally
underestimate the importance of a proposed detection. We address
this issue, first, with additional selection criteria based on
information theory and, second, with analyses of mock data sets
generated under the null hypothesis.
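The evidence scale above can be encoded in a few lines, taking as input the log-evidences returned by the sampler. The text states the thresholds only for evidence in favor of H_1. Applying the same thresholds to |ln B_01| when ln B_01 < 0 (evidence in favor of H_0) is our assumption, following the usual symmetric Jeffreys-type convention (e.g. Trotta 2008). Values falling exactly on a threshold are assigned to the higher category, an arbitrary choice.

```python
def jeffreys_strength(ln_Z1, ln_Z0):
    """Return (ln B_01, strength label, favoured hypothesis), where
    ln B_01 = ln Z_1 - ln Z_0 and the thresholds 1, 2.5, 5 apply to |ln B_01|."""
    ln_B01 = ln_Z1 - ln_Z0
    favoured = "H1" if ln_B01 > 0 else "H0"
    size = abs(ln_B01)
    if size < 1.0:
        label = "inconclusive"
    elif size < 2.5:
        label = "weak"
    elif size < 5.0:
        label = "moderate"
    else:
        label = "strong"
    return ln_B01, label, favoured


print(jeffreys_strength(-100.0, -103.0))   # (3.0, 'moderate', 'H1')
```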

As mentioned in the previous paragraph, we wish to supplement
the Bayes factor with other selection criteria based on information
theory. Typically, these criteria are derived from D_KL, which
quantifies how much information is gained by switching from one
probability distribution to another. For our case, this quantity is

$$ D_{KL}(P_1, P_0) = \int \ln\left[\frac{P(\theta|D, H_1)}{P(\theta|D, H_0)}\right] P(\theta|D, H_1)\, d\theta \qquad (10) $$

where P_0 and P_1 are the posteriors under hypotheses H_0 and H_1,
respectively. Another quantity, the DIC (Spiegelhalter et al. 2002), is
related to the amount of information gained through the full
posterior as opposed to assuming only the prior probability
distribution (i.e., D_KL(P, Pr)):



$$ {\rm DIC} = -2\, D_{KL}(P, \Pr) + 2\, C_b \qquad (11) $$



where

$$ C_b = \overline{\chi^2(\theta)} - \chi^2(\hat{\theta}), \qquad \chi^2 = -2\ln\mathcal{L}, \qquad D_{KL}(P, \Pr) = \ln\mathcal{L}(\hat{\theta}) - \ln Z $$

and the overbar denotes a posterior average (Trotta 2008). We
emphasize that the evidence (or Bayes factor) and D_KL should be
used in preference to the traditional information criteria whenever
possible. We also introduce the total membership as a physically
interpretable model selection statistic tailored to the problem at
hand. The probability that a star is a member of the secondary
population is derived from the posterior as the ratio of the secondary
likelihood to the total likelihood (Martinez et al. 2011). For the ith
star, the membership is

$$ m_i = \frac{f(R_i)\, P_s(v_i|R_i, \theta_s)}{[1 - f(R_i)]\, P_p(v_i|R_i, \theta_p) + f(R_i)\, P_s(v_i|R_i, \theta_s)} \qquad (12) $$
Parameter        Type            Prior (units)                  Cold Spot            Velocity Offset

Model parameters from Bayesian object detection method
σ_s (km s^-1)    flat            Cut 1/2 (see caption)          3.5 (+1.75/-2.25)    8.75 (+…/-2.25)
σ_p (km s^-1)    flat            0 to 20                        11.75 ± 0.5          10.75 ± 0.5
v_s (km s^-1)    flat            Cut 1/2 (see caption)          -246.75 (+…/-…)      -258.75 (+…/-…)
v_p (km s^-1)    flat            -252 to -242                   -247.5 ± 0.75        -245.25 ± 0.75
x_cen (kpc)      flat            -0.6 to 0.6                    0.25 (+0.04/-…)      -0.24 ± 0.09
y_cen (kpc)      flat            -0.4 to 0.4                    -0.07 (+…/-0.07)     0.23 ± 0.02
r_tidal (pc)     flat in log10   10 to 300                      151 (+11/-…)         251 (+24/-…)
F_total          flat in log10   10^-5 to 1                     …                    … (+…/-0.26)

Secondary population model parameters from simultaneous 3-component modeling
x_cen (kpc)      flat            -0.24 ± 0.1 (see caption)      0.26 ± 0.02          … (+0.095/-0.035)
y_cen (kpc)      flat            0.23 ± 0.1 (see caption)       -0.07 ± 0.01         0.22 ± 0.02
r_tidal (pc)     flat in log10   10 to 300                      151 (+151/-…)        269 (+26/-…)
σ_s (km s^-1)    flat            Cut 1/2 (see caption)          4.25 ± 0.75          9.25 ± 1.25
σ_p (km s^-1)    flat            0 to 20                        11.5 ± 0.5           11.5 ± 0.5
v_s (km s^-1)    flat            Cut 1/2 (see caption)          -246.25 ± 1.0        -258.0 ± 1.5
v_p (km s^-1)    flat            -252 to -242                   -245.25 (+…/-…)      -245.25 (+…/-…)
f_local          derived         -                              70% (15.8/22.5)      85% (27.0/31.6)

Table 2. Parameters, Priors, and Posteriors. σ_s and σ_p are the velocity dispersions of the secondary and primary populations; v_s and v_p are the average
velocities of the secondary and primary populations. x_cen and y_cen are the x and y centers of the secondary population. Note that the data were rotated such
that the x axis and the major axis are parallel. r_tidal is the tidal radius in a top-hat model for the secondary population. F_total is the ratio of stars in the secondary
population to the total population. In the first section, the 4th and 5th columns give the values when detecting the two objects individually. The two cuts
indicated in the table as "Cut 1/2" are defined as follows. Cut 1 is 0 < σ_s < 10 km s^-1 and -252 < v_s < -242 km s^-1, to find the cold spot object. Cut 2 is
0 < σ_s < 20 km s^-1 and -267 < v_s < -237 km s^-1, to find the velocity offset object. In the second section, the 4th and 5th columns give the values calculated
for the two objects simultaneously using a 3-component model. The coordinates x_cen and y_cen of the objects were only allowed to vary within ±0.1 kpc of the
values obtained from the Bayesian object detection method. f_local is the weighted average fraction of secondary population stars in each secondary object's
location.



As the membership is derived from the posterior, each star has its
own membership probability distribution. Our data set contains 212
stars, so to simplify the analysis we use the mean of each star's
membership distribution. Summing these gives a global model
selection parameter, the total average membership, which can be
interpreted as the average number of stars in the secondary
population. We find (see Figures 3 and 4) that the membership
correlates with each of the other model selection parameters (i.e., a
model with high evidence will have high membership and a model
with low evidence will have low membership).
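Both the DIC of Eq. (11) and the total membership are posterior averages, so given posterior samples they reduce to a few array operations. The sketch assumes the following are available from the sampler: ln L at each posterior sample, ln L at a chosen point estimate θ̂ (for example the posterior mean), and ln Z. For the membership, it assumes f(R_i), P_s, and P_p have already been evaluated for every sample and star. Names and array layouts are illustrative.

```python
import numpy as np


def dic(loglike_samples, loglike_at_estimate, log_evidence):
    """DIC of Eq. (11): -2 D_KL(P, Pr) + 2 C_b.

    loglike_samples: ln L at each posterior sample.
    loglike_at_estimate: ln L at the point estimate theta-hat.
    log_evidence: ln Z, e.g. from the nested-sampling run."""
    chi2 = -2.0 * np.asarray(loglike_samples, float)
    chi2_hat = -2.0 * loglike_at_estimate
    C_b = chi2.mean() - chi2_hat                     # Bayesian complexity
    D_KL = loglike_at_estimate - log_evidence        # as defined below Eq. (11)
    return -2.0 * D_KL + 2.0 * C_b


def mean_memberships(f, P_s, P_p):
    """Eq. (12) for every posterior sample (rows) and star (columns), averaged
    over samples to give each star's mean membership."""
    num = f * P_s
    return (num / ((1.0 - f) * P_p + num)).mean(axis=0)


# Usage with placeholder arrays: 2 posterior samples x 3 stars.
f = np.full((2, 3), 0.5)
P_s = np.array([[0.02, 0.01, 0.03], [0.02, 0.01, 0.03]])
P_p = np.array([[0.02, 0.03, 0.01], [0.02, 0.03, 0.01]])
m = mean_memberships(f, P_s, P_p)
total_membership = m.sum()    # average number of stars in the secondary population
```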

2.3 Testing the Method with Mock Data 

We created 100 mock data sets containing a second population
to test whether known secondary objects could be detected using
our object detection method. The second populations were located
at either (0.2, -0.1) or (-0.23, 0.24) kpc (roughly the locations of
the cold and velocity offset objects). The kinematic and structural
parameters of this second population were selected to mimic the
cold and velocity offset objects. The positions and velocity errors
from the Ursa Minor data set were used to simulate observational
errors. To decide which population a star is assigned to, the local
fraction was computed via Equation 8, and membership was assigned
randomly, with the probability of belonging to the second population
given by the local fraction. The primary population parameters were
the best-fit values from Ursa Minor photometry and the kinematics of
the entire sample: $r_{\rm tidal} = 1.745$ kpc, $r_{\rm core} = 0.401$ kpc,
ellipticity $e_p = 0.56$, $\sigma = 11.5$ km s$^{-1}$, and
$\bar v = -247$ km s$^{-1}$. The second population's base parameters
were: $e_s = 0$, $\theta_s = 0.0$, $F_{\rm total} = 60/212$,
$r_{\rm core} = 0.05$ kpc, $\Delta\bar v_s = \ldots$ km s$^{-1}$,
$\sigma = 4$ km s$^{-1}$, and $r_{\rm tidal} = 0.15$ kpc for the
(0.2, -0.1) kpc location. For the (-0.23, 0.24) kpc location, we used
a slightly larger tidal radius, $r_{\rm tidal} = 0.25$ kpc. We note that
both populations were created assuming an underlying King profile,
but the object detection used a top-hat model when finding the
second population, exactly as the objects were found in the actual
data. Each mock data set had 1–3 secondary parameters that deviated
from the base parameters, to test how each parameter affected the
detection. In some sets we did not expect to find the secondary
population, for example, if it had a small tidal radius or a small
secondary population fraction.
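The random membership assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the local fraction at each star is taken as already computed (the King-profile calculation behind Equation 8 is not shown), positions are reused from the data so only velocities are drawn, and the secondary mean velocity is a placeholder because the base velocity offset is not recoverable from the text. The function name `draw_mock_velocities` is hypothetical.

```python
import numpy as np


def draw_mock_velocities(f_local, err, primary, secondary, rng):
    """Assign each star to the secondary population with probability f_local, then draw
    an observed velocity: an intrinsic Gaussian draw plus Gaussian noise with the star's error."""
    is_secondary = rng.random(f_local.shape) < f_local
    mean = np.where(is_secondary, secondary["v"], primary["v"])
    sigma = np.where(is_secondary, secondary["sigma"], primary["sigma"])
    v_intrinsic = rng.normal(mean, sigma)
    v_obs = v_intrinsic + rng.normal(0.0, err)
    return is_secondary, v_obs


rng = np.random.default_rng(42)
n_stars = 212
f_local = np.zeros(n_stars)
f_local[:40] = 0.7                       # hypothetical: 40 stars lie inside the secondary footprint
err = np.full(n_stars, 2.0)              # hypothetical errors; the real ones come from the data set
primary = {"v": -247.0, "sigma": 11.5}   # primary kinematics quoted in the text
secondary = {"v": -247.0, "sigma": 4.0}  # secondary mean velocity is a placeholder
is_sec, v_obs = draw_mock_velocities(f_local, err, primary, secondary, rng)
```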

The results for model selection using $D_{\rm KL}$, DIC, $\ln B_{01}$, and
total membership with two different kinematic priors are summarized
in the left and middle columns of Figure 3 (secondary population
located at (0.2, -0.1) kpc) and Figure 4 (secondary population located
at (-0.23, 0.24) kpc). The left column uses the cuts to find
kinematically cold objects ($0 < \sigma < 10$ km s$^{-1}$,
$-252 < \bar v < -242$ km s$^{-1}$). The middle column uses the cuts to
find objects with a significant velocity offset ($0 < \sigma < 20$ km s$^{-1}$,
$-267 < \bar v < -237$ km s$^{-1}$); this cut will also find kinematically
cold objects, but in the Ursa Minor case the velocity offset object was
significantly more likely and tended to dominate the posterior. The
symbols in these columns are labeled/colored according to a by-eye
classification of the x and y posteriors: peaked/"found" (red square),
not peaked/"not found" (green x), "possible" peaks (blue triangle), and
double peaked with one correct peak (light blue diamond). Results for
the actual Ursa Minor data with the corresponding cuts are shown as
filled black circles. The "possible" peaks are posteriors with a peak
near the second population's center, a small/medium peak elsewhere
in the posterior, or a small peak at the correct location. The double
peaked data had one peak at the correct location and a second at
another location. The "possible" sets tended to span the border
between "found" and "not found" and were not easily categorized
otherwise.

Figure 3. Model selection tests using $D_{\rm KL}$, DIC, and $\ln B_{01}$ (cf. §2.2 for definitions) for 50 mock data sets located at (0.2, -0.1) kpc. Also shown for
comparison are the results for the actual Ursa Minor data set. A more negative DIC favors the secondary object hypothesis more strongly, while the same is
true for larger values of $D_{\rm KL}$ and the Bayes factor. Left column: results of analyzing the mock data sets in exactly the same way as the real data set was
analyzed to look for the cold object, with cuts on mean velocity and dispersion given by $0 < \sigma < 10$ km s$^{-1}$ and $-252 < \bar v < -242$ km s$^{-1}$
(Cut 1). The top panel shows $D_{\rm KL}$, the middle panel DIC, and the bottom panel the logarithm of the Bayes factor (written in the text as $\ln B_{01}$). Mock data sets
that had second populations with significant differences in their kinematics with respect to the background population were found with our object detection
method. The symbols are labeled/colored according to a by-eye classification of the x and y posteriors: peaked/found (red square), not peaked/not found (green
x), possible peaks (blue triangle), and double peaked with one correct peak (light blue diamond). The results for the actual Ursa Minor data set are shown as filled
black circles. Middle column: same symbols and colors as the left column; the difference is that the velocity cuts are broader (the same as those used to find
the velocity offset object): $0 < \sigma < 20$ km s$^{-1}$ and $-267 < \bar v < -237$ km s$^{-1}$ (Cut 2). Right column: histograms
of $D_{\rm KL}$, DIC and Bayes factor from analyses of 1000 null hypothesis mock data sets with Cut 1 (red dotted) and Cut 2 (blue solid). The vertical lines show the
$D_{\rm KL}$, DIC and Bayes factor values (in the top, middle and bottom panels, respectively) for the actual Ursa Minor data set with Cut 1 (green dotted) and Cut 2
(magenta dot-dashed).

Both figures show a clear trend separating the "found" and "not
found" sets in all the model selection methods. Note that a more
negative DIC corresponds to favoring the more complicated model.
Sets that are "not found" by eye have model selection criteria
equivalent to those of null hypothesis mock data sets (i.e., sets made
with no second population; cf. Section 3.1). The model selection
criteria for the two objects found in Ursa Minor also lie in the
"found" region of the mock data's selection criteria. From the
analysis of these mock data sets we conclude that our method is fully
capable of detecting the cold and velocity offset objects, and that the
model selection criteria favor the presence of two additional
components in Ursa Minor.



3 RESULTS 

We have found two objects in the Ursa Minor data set of
Muñoz et al. (2005) using a Bayesian object detection method. The
first object, referred to here as the "cold object", is kinematically
cold, $\sigma_{\rm cold} = 3.5$ km s$^{-1}$, with an average velocity close
to that of the full Ursa Minor sample, $\bar v_{\rm cold} = -246.8$ km s$^{-1}$.
Its location coincides with that of the K03 stellar clump. The second
object, referred to as the velocity offset object, has a large average
velocity offset with respect to the mean velocity of Ursa Minor,
$\bar v_{\rm vo} = -258.8$ km s$^{-1}$, with a dispersion of
$\sigma_{\rm vo} = 8.8$ km s$^{-1}$. The kinematic and structural
properties, with their uncertainties, are summarized in the first
section of Table 2. The model selection tests for the cold object give:
total membership = 15.8, $D_{\rm KL} = 4.8$, DIC = -26.1, and
$\ln B_{01} = 0.9$. The model selection tests for the velocity offset
object give: total membership = 27.0, $D_{\rm KL} = 13.9$, DIC = -36.5,
and $\ln B_{01} = 3.6$. In Figures 3 and 4 the results of the model
selection tests are plotted alongside the mock set distributions. All of
the model selection tests favor the additional secondary objects with
moderate to high significance, except for the Bayes factor, which has
weak to moderate significance for the cold and velocity offset objects.
This significance is based on the recommendations of Trotta (2008),
Ghosh et al. (2006) and Spiegelhalter et al. (2002). However, it is
important to judge the significance of the information criteria and the
Bayes factor for the problem at hand. We do this by generating mock
data sets and deriving the information criteria and Bayes factor in the
same way as for the real data. When this test is performed, we find
that the confidence levels of both objects are above 98% (see
Table 3). In addition, all of the model selection values, for both
locations/objects, lie in the "found" region of the mock sets in
Figures 3 and 4.

Figure 4. Model selection tests using $D_{\rm KL}$, DIC, and $\ln B_{01}$ for 50 mock sets located at (-0.24, 0.23) kpc and for the Ursa Minor data. The layout is the same
as in Figure 3, except that the third column from the left displays the results from the scrambled mock data sets instead of the null hypothesis mock data sets
plotted in Figure 3.



3.1 Significance of Information Criteria and Bayes' Factor 

In order to assess the significance of the model selection tests,
knowledge of the false positive rate is helpful. We make use of two
types of tests: null hypothesis mock data sets and scrambled data
sets. Null hypothesis mock data sets are constructed by redrawing
the line-of-sight velocities from a Gaussian with Ursa Minor
kinematics². To simulate positional and velocity errors, the positions
of the stars and their line-of-sight velocity errors were kept. The
scrambled sets were constructed by reassigning a randomly chosen
observed pair of line-of-sight velocity and velocity error, without
replacement, to each star in the data set. We constructed 1000 null
hypothesis mock data sets and scrambled data sets and analyzed them
with our object detection method.
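The two kinds of false-positive sets can be sketched in a few lines. This is a hedged illustration: the null sets use the footnoted Ursa Minor kinematics and add each star's kept velocity error as Gaussian noise (drawing from a Gaussian of width $\sqrt{\sigma^2 + \epsilon_i^2}$ is equivalent), and the scrambled sets permute the observed (velocity, error) pairs among the stars, which is sampling without replacement. The function names are hypothetical.

```python
import numpy as np


def null_hypothesis_set(err, rng, v_mean=-247.0, sigma=11.5):
    """Redraw velocities from one Gaussian with Ursa Minor kinematics.
    Positions and errors are kept; each star's error is added as Gaussian noise."""
    return rng.normal(v_mean, np.sqrt(sigma**2 + err**2))


def scrambled_set(v, err, rng):
    """Reassign the observed (velocity, error) pairs to the stars at random, without replacement."""
    idx = rng.permutation(len(v))
    return v[idx], err[idx]


rng = np.random.default_rng(0)
v = np.array([-250.0, -240.0, -260.0, -245.0])  # hypothetical velocities
err = np.array([1.0, 2.0, 3.0, 4.0])           # hypothetical errors
v_scr, err_scr = scrambled_set(v, err, rng)
v_null = null_hypothesis_set(err, rng)
```

Each such set would then be run through the same object detection and model selection pipeline as the real data.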

The results of the object detection method and of our model
selection tests for the null hypothesis mock data sets and the
scrambled mock data sets are shown in the last columns of Figures 3
and 4, respectively. The $D_{\rm KL}$ (top), DIC (middle), and
$\ln B_{01}$ (bottom) values are binned, and each histogram is
normalized so that its maximum is unity. The analysis with the cuts to
find cold objects ($0 < \sigma < 10$ km s$^{-1}$,
$-252 < \bar v < -242$ km s$^{-1}$) is shown in red, while that with the
cuts to find objects with a significant velocity offset
($0 < \sigma < 20$ km s$^{-1}$, $-267 < \bar v < -237$ km s$^{-1}$) is
shown in blue. The model selection results for the real Ursa Minor
data are plotted as vertical lines: the cold object with a green dotted
line and the velocity offset object with a magenta dot-dashed line.
Relative to both the null hypothesis mock data sets and the scrambled
data sets, the confidence levels are at or above 98.5% for every model
selection criterion; they are summarized in Table 3. Even though
$\ln B_{01}$ shows only weak evidence for the cold object according to
standard definitions, it is still above the 95% confidence level for both
the null hypothesis mock data sets and the scrambled data sets.

² We used $\bar v = -247.0$ km s$^{-1}$ and $\sigma = 11.5$ km s$^{-1}$.

Figure 5. The posteriors for the secondary populations in Ursa Minor using the three-component model. The secondary populations are fixed at (0.25, -0.07) kpc
and (-0.24, 0.23) kpc and allowed to vary by 0.1 kpc in both coordinates. They correspond to the cold (black solid) and the velocity offset (red dotted) objects,
respectively. Upper left: the x-coordinate posteriors of the secondary populations. Upper right: the y-coordinate posteriors of the secondary populations.
Lower left: the velocity dispersion posteriors of the cold object (black solid), velocity offset object (red dotted), and the primary (blue dashed). Lower right:
the average velocity posteriors of the cold object (black solid), velocity offset object (red dotted), and the primary (blue dashed). The secondary populations
have distinct kinematic properties and are both localized.

3.2 Narrowing down secondary population parameters using a 3-component model

To reliably calculate the kinematic properties of the secondary
objects, we introduce a model with two secondary populations. The
additional populations are only allowed to vary by 0.1 kpc in x and
y from the best-fit center locations found by the Bayesian object
detection method for the cold and velocity offset objects.
Equation 7 is extended to include the third component, and instead
of a single normalization parameter there are now two normalization
parameters, one per secondary object, where $N_1$ and $N_2$ denote the
normalizations of the first and second objects. The results for the
kinematic parameters are $\sigma_{\rm cold} = 4.3 \pm 0.8$ km s$^{-1}$,
$\bar v_{\rm cold} = -246.3 \pm 1.0$ km s$^{-1}$,
$\sigma_{\rm vo} = 9.3 \pm 1.3$ km s$^{-1}$, and
$\bar v_{\rm vo} = -258.0 \pm 1.5$ km s$^{-1}$. These values are in full
agreement with those obtained using the two-component (Bayesian
object detection) method.
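A minimal sketch of a three-component likelihood of this kind follows. It is not Equation 7: for readability it parametrizes each secondary object by a local fraction $f_k$ inside a top-hat footprint of radius $r_k$ (the top-hat shape follows the detection model described in the mock-data tests) rather than by the normalizations $N_1$ and $N_2$, and assumes Gaussian velocity distributions broadened by the measurement errors. All names are hypothetical.

```python
import numpy as np


def gauss(v, mean, sigma, err):
    """Gaussian velocity likelihood broadened by each star's measurement error."""
    s2 = sigma**2 + err**2
    return np.exp(-0.5 * (v - mean) ** 2 / s2) / np.sqrt(2.0 * np.pi * s2)


def log_likelihood_3comp(x, y, v, err, primary, secondaries):
    """log L = sum_i log[(1 - sum_k f_ik) L_p + sum_k f_ik L_k], where f_ik equals the
    object's local fraction f_k inside a top-hat of radius r_k and zero outside."""
    like = np.zeros(len(v))
    f_total = np.zeros(len(v))
    for s in secondaries:
        inside = np.hypot(x - s["xc"], y - s["yc"]) < s["r"]
        f = np.where(inside, s["f"], 0.0)
        like += f * gauss(v, s["v"], s["sigma"], err)
        f_total += f
    if np.any(f_total > 1.0):
        raise ValueError("overlapping top-hats give a total secondary fraction above 1")
    like += (1.0 - f_total) * gauss(v, primary["v"], primary["sigma"], err)
    return float(np.sum(np.log(like)))


# Centers and kinematics from the text; radii and fractions are placeholders.
primary = {"v": -247.0, "sigma": 11.5}
secs = [dict(xc=0.25, yc=-0.07, r=0.15, f=0.0, v=-246.3, sigma=4.3),
        dict(xc=-0.24, yc=0.23, r=0.25, f=0.0, v=-258.0, sigma=9.3)]
x = np.array([0.25, -0.24, 0.0])
y = np.array([-0.07, 0.23, 0.0])
v = np.array([-246.0, -258.0, -247.0])
err = np.full(3, 2.0)
```

Restricting the centers to $\pm 0.1$ kpc, as in the text, would be imposed through the prior on `xc` and `yc` in the sampler, not in the likelihood itself.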

The normalization ratios, as defined, are not easily interpreted.
We therefore introduce a derived parameter, the local fraction
$f_{\rm local}$, defined as the weighted number of stars with
secondary-population memberships greater than 50%, relative to the
total number of stars within the secondary object's tidal radius. In
short, it is a measure of the fraction of secondary stars at each
object's location. We derive $f_{\rm local,cold} = 15.8/22.5$, or 70%,
and $f_{\rm local,vo} = 27.0/31.6$, or 85%. Clearly, we are able to find
these objects only because they appear to have a high local fraction.
The kinematic and structural properties of the secondary population
model are summarized in the second section of Table 2. In the upper
left and right panels of Figure 5 we plot the posteriors for the x and
y centers, respectively, for the cold (black solid) and velocity offset
objects (red dotted). The centers for the cold and velocity offset
objects are (0.25, -0.07) kpc and (-0.24, 0.23) kpc, and the two panels
show the deviation from these "fixed" centers. The lower left (lower
right) panel of Figure 5 shows the posterior of $\sigma_s$ ($\bar v_s$)
for the cold (black solid), velocity offset (red dotted), and primary
(blue dashed) populations.
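The local-fraction definition admits more than one reading, so the sketch below implements one literal interpretation as an assumption: for each posterior draw, sum the memberships of stars inside the tidal radius whose membership exceeds 0.5, count all stars inside the tidal radius, and take the ratio of the posterior-averaged numerator to the posterior-averaged denominator. Note that the quoted numerators (15.8 and 27.0) coincide with the total average memberships, so the authors' numerator may include all memberships rather than only those above 0.5. The function name is hypothetical; the last two lines only check the quoted arithmetic.

```python
import numpy as np


def local_fraction(x, y, memb_draws, centers, r_tidal):
    """Ratio of the posterior-averaged membership-weighted count of stars with membership
    above 0.5 inside the tidal radius to the posterior-averaged number of stars inside it."""
    num, den = [], []
    for p, (xc, yc), rt in zip(memb_draws, centers, r_tidal):
        inside = np.hypot(x - xc, y - yc) < rt
        num.append(p[inside & (p > 0.5)].sum())
        den.append(inside.sum())
    return float(np.mean(num) / np.mean(den))


# Arithmetic behind the values quoted in the text.
f_cold = 15.8 / 22.5
f_vo = 27.0 / 31.6
```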

An increased prior volume for the centers and tidal radius in
the 3-component model changes the posteriors for the structural
parameters of the velocity offset object but does not change its
kinematics. When only the location of the centers is given more
freedom (200 pc versus 100 pc), the posteriors of both centers develop
tails. An increase in the maximum tidal radius (in the prior) of the
objects (500 pc from 300 pc) increases the size of the velocity offset
object and moves its center roughly 150 pc away from the center
of Ursa Minor, while the same change introduces tails in the
posterior of the cold object. Given these results, it is fair to say that
the size and center of the secondary objects are not known with
precision and that more data will help considerably. However, our
conclusions regarding kinematics seem to be robust.

Table 3. Confidence levels computed from null hypothesis and scrambled mock data sets. Columns give the total average membership, the information
entropy $D_{\rm KL}$, the DIC, and the Bayesian evidence $\ln B_{01}$. The inferred C.L. refers to the fraction of null hypothesis mock data sets or scrambled data
sets that have a model selection value lower than that of the actual Ursa Minor data. The 95% C.L. value is defined such that 95% of the null hypothesis or
scrambled data sets have a value below it. Both additional populations found in the Ursa Minor data are above the 98% C.L. for all the model selection methods.
Cut 1 ($0 < \sigma < 10$ km s$^{-1}$ and $-252 < \bar v < -242$ km s$^{-1}$) is used to find the cold spot object in the data; Cut 2 ($0 < \sigma < 20$ km s$^{-1}$ and
$-267 < \bar v < -237$ km s$^{-1}$) is used to find the velocity offset (VO) object.

Test using null hypothesis mock data sets
                                           Membership       D_KL             DIC              ln B01
95% C.L. value from null sets, Cut 1       5.25             1.28             -16.35           0.17
Cold object from data (inferred C.L.)      15.82 (99.8%)    4.82 (99.7%)     -26.08 (99.5%)   0.87 (99.7%)
95% C.L. value from null sets, Cut 2       4.49             1.84             -17.79           0.13
VO object from data (inferred C.L.)        27.02 (>99.9%)   13.93 (>99.9%)   -36.49 (99.9%)   3.59 (>99.9%)

Test using scrambled data sets
                                           Membership       D_KL             DIC              ln B01
95% C.L. value from scrambled sets, Cut 1  6.99             2.22             -20.45           0.40
Cold object from data (inferred C.L.)      15.82 (99.7%)    4.82 (99.1%)     -26.08 (98.5%)   0.87 (99.0%)
95% C.L. value from scrambled sets, Cut 2  3.89             1.46             -16.30           0.07
VO object from data (inferred C.L.)        27.02 (>99.9%)   13.93 (>99.9%)   -36.49 (>99.9%)  3.59 (>99.9%)
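The confidence levels in Table 3 follow from empirical comparisons with the false-positive sets. A minimal sketch, assuming the values from the 1000 null or scrambled sets are available as an array: the inferred C.L. is the fraction of sets whose value is less significant than the data's, and the 95% C.L. value is an empirical quantile. Because a more negative DIC favors the secondary object, the comparison direction is flipped for DIC via a `larger_is_better` flag; the function names are hypothetical.

```python
import numpy as np


def inferred_cl(null_values, data_value, larger_is_better=True):
    """Fraction of null (or scrambled) sets whose model-selection value is less significant
    than the data's. Use larger_is_better=False for DIC, where more negative is more significant."""
    null_values = np.asarray(null_values, dtype=float)
    if larger_is_better:
        return float(np.mean(null_values < data_value))
    return float(np.mean(null_values > data_value))


def threshold_at_cl(null_values, level=0.95, larger_is_better=True):
    """Model-selection value that a fraction `level` of the null sets do not exceed
    (for DIC: do not fall below), taken as an empirical quantile."""
    null_values = np.asarray(null_values, dtype=float)
    if larger_is_better:
        return float(np.quantile(null_values, level))
    return float(np.quantile(null_values, 1.0 - level))
```

With 1000 sets, an inferred C.L. of ">99.9%" simply means that none of the sets reached the data's value.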

3.3 Perspective Motion 

Line-of-sight velocity measurements for the Milky Way satellites
receive a small contribution from the x and y components of each
star's velocity (where z is along the line of sight to the center of the
galaxy), and this contribution increases with distance from the
center (Feast et al. 1961; Kaplinghat & Strigari 2008). A similar
contribution could also arise from solid-body rotation or some other
physical mechanism (such as tides) that leads to a velocity gradient.
Motivated by the large velocity offset we found, we ask whether
this term changes our conclusions. The observed line-of-sight
velocity of a star may be written as

$$ v_{\rm los} = v_z - v_x \frac{x}{D} - v_y \frac{y}{D}, \qquad (13) $$

where D is the distance to the galaxy and (x, y) are the projected
coordinates on the sky. This method has been applied to the Fornax,
Sculptor, Sextans, and Carina dSphs, and the results agree with other
methods (Walker et al. 2008). The proper motion we find assuming
only a primary population with a constant velocity dispersion is
$(\mu_\alpha, \mu_\delta) = (529 \pm 848, -280 \pm 449)$ mas century$^{-1}$,
which clearly shows that we are unable to constrain the proper motion
of Ursa Minor using this effect.
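To see why Equation 13 constrains $v_x$ and $v_y$ so weakly, the sketch below fits $(v_z, v_x, v_y)$ by weighted least squares under a simplifying assumption: a single population with constant dispersion, rather than the full Bayesian likelihood used in the paper. The lever arm $x/D$ is tiny (for example, 0.4 kpc at $D = 77$ kpc gives $x/D \approx 0.005$), so the formal errors on the tangential components come out far larger than the error on $v_z$. The function names are hypothetical.

```python
import numpy as np


def perspective_vlos(vz, vx, vy, x, y, D):
    """Equation 13: v_los = v_z - v_x x/D - v_y y/D (x, y and D in the same length unit)."""
    return vz - vx * x / D - vy * y / D


def fit_bulk_velocity(vlos, err, x, y, D, sigma):
    """Weighted least-squares estimate of (v_z, v_x, v_y) and their 1-sigma errors, assuming a
    single population with constant dispersion sigma (a simplification of the full model)."""
    w = 1.0 / np.sqrt(sigma**2 + err**2)
    A = np.column_stack([np.ones_like(x), -x / D, -y / D]) * w[:, None]
    coef, *_ = np.linalg.lstsq(A, vlos * w, rcond=None)
    cov = np.linalg.inv(A.T @ A)
    return coef, np.sqrt(np.diag(cov))


# Hypothetical star positions (kpc) spanning roughly the extent of the data set.
rng = np.random.default_rng(1)
x = rng.uniform(-0.5, 0.5, 212)
y = rng.uniform(-0.5, 0.5, 212)
truth = (-247.0, 100.0, -50.0)
vlos = perspective_vlos(*truth, x, y, 77.0)
coef, sig = fit_bulk_velocity(vlos, np.full(212, 2.0), x, y, 77.0, 11.5)
```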

Observations with HST give a proper motion for Ursa Minor of
$(\mu_\alpha, \mu_\delta) = (-50 \pm 17, 22 \pm 16)$ mas century$^{-1}$
(Piatek et al. 2005), an order of magnitude smaller (comparing the
means) than our result. If stars with high membership in the velocity
offset object are weighted as not belonging to Ursa Minor, the proper
motion of this subset is $(\mu_\alpha, \mu_\delta) = (117 \pm 90, 163 \pm 127)$
mas century$^{-1}$. Removing both secondary populations in this way
gives $(-84 \pm 79, -185 \pm 174)$ mas century$^{-1}$. If we remove all
the stars in these locations, we find $(-67 \pm 60, -203 \pm 181)$
mas century$^{-1}$. These comparisons clearly demonstrate that it is
hard to estimate the tangential velocity from perspective motion if
there are secondary populations in the data set.
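Comparing proper motions with the tangential velocities $(v_x, v_y)$ that appear in Equation 13 requires a unit conversion, which the text does not spell out. A small helper, assuming the standard relation $v_t = 4.74\,\mu\,D$ (km s$^{-1}$, with $\mu$ in arcsec yr$^{-1}$ and $D$ in pc) and the distance $D = 77$ kpc quoted later in this section; at that distance, 100 mas century$^{-1}$ corresponds to roughly 365 km s$^{-1}$. The function name is hypothetical.

```python
K = 4.740470463  # km/s per (arcsec/yr * pc): 1 AU per Julian year expressed in km/s


def pm_to_tangential_kms(mu_mas_per_century, distance_kpc):
    """Convert a proper-motion component in mas/century to a tangential velocity in km/s."""
    mu_arcsec_per_yr = mu_mas_per_century * 1e-5  # 1 mas/century = 1e-3 arcsec per 100 yr
    return K * mu_arcsec_per_yr * distance_kpc * 1e3


# HST proper motion quoted above, converted at D = 77 kpc.
v_alpha = pm_to_tangential_kms(-50.0, 77.0)
v_delta = pm_to_tangential_kms(22.0, 77.0)
```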

To investigate this issue further, we run a three-component
model to try to pin down the two secondary components when
perspective motion is included. We add this effect to our likelihood
function by changing the model velocity for all three components
(cf. $v_{p,s}$ in Equation 4) to $v_{\rm los}$, given by Equation 13, with
$x_i$ and $y_i$ for each star measured from the center of Ursa Minor.
Each component has its own $v_z$, but $v_x$ and $v_y$ are shared by all
three components. Note that the actual tangential velocity of the
two secondary components is now implicitly tied to the $v_z$ value;
there is no hope of disentangling them given the small projection of
the secondary components on the sky. We then impose the same
constraints on the centers as before (cf. §3.2). We find results
consistent with those found in §3.2 in the absence of perspective
motion: $x_{\rm cold} = 0.245$ kpc, $y_{\rm cold} = -0.065$ kpc, and
$x_{\rm vo} = -0.275$ kpc, $y_{\rm vo} = 0.24 \pm 0.025$ kpc. The kinematic
properties are the same as without perspective motion, except that the
error bars are larger. Thus the three-component model with the prior
on the centers provides a different fit and favors the presence of the
secondary objects over perspective motion. Had perspective motion, a
velocity gradient, or rotation fit the likelihood better than either of
the objects, this would not have been the case, since the likelihood
has the freedom to dial down the fraction of stars in the secondary
objects. In this three-component fit, the mean velocity
$(v_x, v_y, v_z)$ of Ursa Minor is $(-311 \pm 212, -548, -245.5 \pm 0.75)$
km s$^{-1}$, in good agreement with the results obtained when the stars
in the locations populated by the secondary populations are removed.

Instead of using a three-component model (as we did above),
we also explored the effect of using the Bayesian object detection
method including the perspective motion effect. This could lead
to faulty results (and we show below that it does) because the
velocity offset spot has a large impact on the determination of the
background parameters, specifically the perspective motion. With
the velocity cuts to find the cold object, we find a mean velocity
for Ursa Minor of $(v_x, v_y, v_z) = (-100, -1125, -247.5)$ km s$^{-1}$
and a line-of-sight velocity dispersion of $11.0 \pm 0.5$ km s$^{-1}$. The
dispersion of the cold object is now consistent with zero at about the
$1\sigma$ level ($3.25 \pm 3.0$ km s$^{-1}$), and the locations of the
centers are now much less well determined. However, the values
obtained for the perspective motion are unphysically large, and hence
this is clearly not the correct model to consider. With the broader
velocity cut used to find the velocity offset object, we find a mean
velocity for Ursa Minor of $(-200, -1175, -247)$ km s$^{-1}$ and a
line-of-sight velocity dispersion of $10.75 \pm 0.5$ km s$^{-1}$. The
center, as for the other object, is no longer tightly constrained, and
the hint of a deviation in mean velocity for this object is muted
($-258$ km s$^{-1}$). Thus we arrive at the (unsurprising) conclusion
that varying background parameters in Bayesian object detection
methods can lead to faulty results in data sets containing multiple
signals if those signals significantly affect the determination of the
background parameters. In particular, in this analysis the presence of
the velocity offset spot affects both the magnitude and the direction
of the inferred tangential motion, and hence the object detection
method has trouble fitting one secondary location together with
perspective motion. With two localized secondary populations and
perspective motion, however, the method still picks out both
secondary objects. Thus the three-component model is preferred by
this data set.

A tangential velocity measured using perspective motion
could also be hiding a solid-body rotation. An order-of-magnitude
estimate of this rotation speed is

$$ v_{\rm rot} \approx \frac{R_e}{D}\sqrt{v_x^2 + v_y^2} $$

($R_e = 445 \pm 44$ pc, $D = 77 \pm 4$ kpc). Using the results
presented in this section, we calculate $v_{\rm rot} \sim 7$ km s$^{-1}$
with the entire data set, and $v_{\rm rot} \sim 4$ km s$^{-1}$ when the
velocity offset population is removed, when both secondary
populations are removed, or when all stars near the secondary
populations are removed. The rotation speeds are all comparable, but
in each estimate the rotation is about a different axis. The summary
of our results from this section is that a larger data set is required to
simultaneously constrain the properties of the secondary populations
and the rotation or proper motion. The results of our three-component
analysis suggest that the data prefer the presence of both secondary
objects to perspective motion (or a rotation that masquerades as it).
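The order-of-magnitude rotation estimate can be written as a one-line helper. This is a sketch assuming the form $v_{\rm rot} \approx (R_e/D)\sqrt{v_x^2 + v_y^2}$, i.e. the perspective term of Equation 13 evaluated at a projected radius $R_e$, with the $R_e$ and $D$ values quoted above as defaults; the tangential velocities passed in would come from one of the fits in this section (converted to km s$^{-1}$ if quoted as proper motions). The function name is hypothetical.

```python
import math


def rotation_speed_estimate(vx_kms, vy_kms, re_pc=445.0, distance_kpc=77.0):
    """Order-of-magnitude v_rot ~ (R_e / D) * sqrt(v_x^2 + v_y^2), in km/s."""
    return (re_pc / (distance_kpc * 1e3)) * math.hypot(vx_kms, vy_kms)
```

Because $R_e/D \approx 0.006$, a tangential velocity of several hundred km s$^{-1}$ maps onto a rotation speed of only a few km s$^{-1}$, which is why the perspective term and a modest rotation are hard to tell apart.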



4 DISCUSSION 

K03 used a frequentist likelihood test with a two-component
kinematic model (Ursa Minor dSph plus a secondary population),
similar to our Bayesian object detection method. They discovered
a stellar clump with a high likelihood ratio ($\sim 10^4$) located at
(10', 4') (on-sky frame) relative to the Ursa Minor center, with
parameters $\sigma = 0.5$ km s$^{-1}$, $\Delta\bar v = -1$ km s$^{-1}$, and a
clump fraction of 0.7 (the fraction of stars in the second population).
The kinematically cold object found with our Bayesian object detection
method is centered at (10.8' ± 1.8', 5.5' ± 0.9') (on-sky frame relative
to the Ursa Minor center), has a size of 6.7' ± 0.5', and has kinematic
properties $\sigma = 4.25 \pm 0.75$ km s$^{-1}$ and
$\Delta\bar v = -1.1$ km s$^{-1}$. The difference between our results and
those of K03 lies in the velocity dispersion of the cold object. We
have considerably more stars (212 in total, compared with 134 for K03)
and are therefore able to infer the dispersion with much greater
confidence. We find the mean value of the velocity dispersion to be
close to 4 km s$^{-1}$, similar to the dispersion of the Segue 1 dSph
(Simon et al. 2011).

The main uncertainty in our estimates of the dispersions of the
cold and velocity offset objects is the presence of perspective motion
or solid-body rotation. Perspective motion by itself cannot explain
these secondary populations. A three-component analysis (i.e., the
main Ursa Minor population and both secondary populations) with
the coordinates of the centers fixed to within 0.1 kpc and including
perspective motion (with unconstrained tangential velocity) prefers
the presence of both secondary populations. In this analysis, the
velocity dispersions of the cold and velocity offset objects are not
significantly different from the values obtained without perspective
motion.

To estimate the luminosity of the secondary objects, we use
the total membership of each object, assuming that the stars were
drawn uniformly from the three distributions of Ursa Minor. We find
luminosities of $4 \times 10^4\,L_\odot$ and $6 \times 10^4\,L_\odot$ for the
cold and velocity offset objects, respectively. The luminosity of the
K03 object is $1.5 \times 10^4\,L_\odot$, and given the uncertainties we
consider the two analyses to be in agreement. The dynamical mass
within the half-light radius of dispersion-supported systems can be
estimated to about 20% accuracy using the line-of-sight velocity
dispersion and the half-light radius (Walker et al. 2009a; Wolf et al.
2010). Assuming that the ratio $r_{1/2}/r_{\rm tidal}$ of the objects is
the same as that of Ursa Minor, we find $M_{1/2} = 6 \times 10^5\,M_\odot$
and $M_{1/2} = 5 \times 10^6\,M_\odot$ for the cold and velocity offset
objects, respectively. From this, $M/L(r_{1/2}) \approx 30\,M_\odot/L_\odot$
and $M/L(r_{1/2}) \approx 175\,M_\odot/L_\odot$ for the cold and velocity
offset objects. If we use the same estimator to find the velocity
dispersions, assuming the objects are relaxed systems with only
stellar components and $M/L = 2$ (as in K03), we estimate
$\sigma \approx 1.0$ km s$^{-1}$ for both the cold and velocity offset
objects. This differs from the velocity dispersions found with our
object detection method by $4\sigma$ and $6.6\sigma$ for the cold and
velocity offset objects, respectively. Note that the $M_{1/2}$ estimator
assumes that the system is in dynamical equilibrium, which may not
be the case here. If our current results hold up with the addition of
more data, then either these objects have velocity dispersions highly
inflated by binary orbital motion or tidal disruption, or they really
do have a much larger mass than inferred from their luminosities. In
the latter case, we would have found a satellite of Ursa Minor, the
first detection of a satellite of a satellite galaxy. We discuss each of
these possibilities briefly below.
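The mass-to-light arithmetic above can be sketched with the half-light mass estimator. This assumes the Wolf et al. (2010) calibration $M_{1/2} \approx 930\,\langle\sigma_{\rm los}^2\rangle\,R_e\,M_\odot$ (with $\sigma$ in km s$^{-1}$ and the projected half-light radius $R_e$ in pc), and that half of the total luminosity lies within $r_{1/2}$; the objects' half-light radii are not quoted in the text, so any $R_e$ passed in is a placeholder. Inverting the estimator for an assumed $M/L$ gives the "stellar-only" dispersion used for comparison. The function names are hypothetical.

```python
import math

WOLF_COEFF = 930.0  # M_sun per (km/s)^2 per pc, Wolf et al. (2010) calibration


def half_light_mass(sigma_kms, re_pc):
    """Mass within the 3D half-light radius from the l.o.s. dispersion and projected R_e."""
    return WOLF_COEFF * sigma_kms**2 * re_pc


def ml_within_half_light(m_half, luminosity):
    """M/L within r_1/2, counting half of the total luminosity as enclosed."""
    return m_half / (0.5 * luminosity)


def sigma_for_ml(ml, luminosity, re_pc):
    """Dispersion implied by an assumed M/L (e.g. 2 for a purely stellar system)."""
    m_half = ml * 0.5 * luminosity
    return math.sqrt(m_half / (WOLF_COEFF * re_pc))


# Cold object: M_1/2 and L from the text; R_e below is a placeholder value.
ml_cold = ml_within_half_light(6e5, 4e4)
sigma_stellar = sigma_for_ml(2.0, 4e4, re_pc=36.0)
```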

Binary orbital motion contributes to the line-of-sight velocities
of stars and can inflate the observed velocity dispersion (Aaronson &
Olszewski 1987; Hargreaves et al. 1996; Olszewski et al. 1996; Minor et
al. 2010; McConnachie & Côté 2010). A galaxy with a lower intrinsic
velocity dispersion has a higher chance of having its observed
dispersion inflated. A dSph with a velocity dispersion between 4 and
10 km s$^{-1}$ is highly unlikely to be inflated by more than 30% (Minor
et al. 2010; for applications of this method see Simon et al. 2011 and
Martinez et al. 2011). The objects we have found have observed
velocity dispersions in this range. Assuming both objects are inflated
by 30%, their intrinsic velocity dispersions would be 2.5–3.3 km s$^{-1}$
and 7.1 km s$^{-1}$ for the cold and velocity offset objects,
respectively. These velocity dispersions are still much higher than the
$\sim 1$ km s$^{-1}$ expected for a relaxed stellar system such as a
globular cluster. It is therefore unlikely that binary orbital motion
alone can account for the large velocity dispersions inferred from this
data set for both secondary populations. With multi-epoch data, we
will be able to test this hypothesis directly, as was done for the
Segue 1 dSph (Martinez et al. 2011).

To assess the effect of tidal disruption by Ursa Minor, we
calculate the Jacobi radius, $r_J$, and compare it to the mean tidal
radius $r_t$ estimated from our three-component analysis. To calculate
the Jacobi radius, we consider both an NFW (Navarro et al. 1997) and
a pseudo-isothermal (cored) profile for the halo of Ursa Minor. To set
the NFW density profile of Ursa Minor, we pick an NFW scale radius
$r_s = 1$ kpc and estimate the density normalization $\rho_s$ using the
$M_{1/2}$ values from Wolf et al. (2010) for an NFW profile. We find
that if the actual distance of each object from the center of Ursa
Minor equals its projected distance, then $r_J < r_t$. If the objects are
farther than about 1 kpc away, then $r_J > r_t$ with the NFW profile.
The situation for a pseudo-isothermal profile
($\rho \propto 1/(r^2 + r_0^2)$) with $r_0 = 300$ pc is similar, with
$r_J > r_t$ if the objects are farther than about 1–2 kpc from the center
of Ursa Minor. The $r_J$ estimates indicate that tides from Ursa Minor
could affect these objects even if they are protected by their own
dark matter halos. The survival of globular-cluster-sized objects in
dSphs has far-reaching implications for the density profile of the host
halo (Kleyna et al. 2003; Goerdt et al. 2006; Strigari et al. 2004;
Cowsik et al. 2009; Lora et al. 2012). The objects we find are more
extended and massive than the globular-cluster-sized objects
considered in such work in the past, so these constraints will have to
be re-evaluated.
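The text does not state which Jacobi-radius expression was used, so the sketch below assumes the standard circular-orbit form for an extended host, $r_J = R\,[m / ((3 - d\ln M/d\ln R)\,M(R))]^{1/3}$, which reduces to $R\,(m/3M)^{1/3}$ for a point-mass host, combined with the NFW enclosed mass. The scale radius $r_s = 1$ kpc follows the text; the density normalization and satellite mass are placeholders, since the $\rho_s$ derived from the $M_{1/2}$ values is not quoted. The function names are hypothetical.

```python
import math


def nfw_enclosed_mass(r, rho_s, r_s):
    """NFW mass within r: 4 pi rho_s r_s^3 [ln(1 + x) - x / (1 + x)], with x = r / r_s."""
    x = r / r_s
    return 4.0 * math.pi * rho_s * r_s**3 * (math.log1p(x) - x / (1.0 + x))


def nfw_log_slope(r, r_s):
    """d ln M / d ln r for the NFW enclosed mass (tends to 2 at small r)."""
    x = r / r_s
    mu = math.log1p(x) - x / (1.0 + x)
    return x**2 / ((1.0 + x) ** 2 * mu)


def jacobi_radius(m_sat, R, rho_s, r_s):
    """Circular-orbit Jacobi radius r_J = R [m / ((3 - dlnM/dlnR) M(R))]^(1/3)."""
    M = nfw_enclosed_mass(R, rho_s, r_s)
    return R * (m_sat / ((3.0 - nfw_log_slope(R, r_s)) * M)) ** (1.0 / 3.0)


# Placeholder inputs in kpc and M_sun; r_s = 1 kpc as in the text.
rho_s = 1e7   # M_sun / kpc^3, placeholder normalization
m_sat = 6e5   # M_sun, of the order of the cold object's M_1/2
r_j_by_distance = {R: jacobi_radius(m_sat, R, rho_s, 1.0) for R in (0.3, 1.0, 2.0)}
```

Comparing such $r_J$ values with the fitted tidal radii at trial 3D distances reproduces the kind of reasoning given above: the farther the object sits from the center of Ursa Minor, the larger its Jacobi radius.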

Generically, the estimated high dispersions of these objects
and their survival are at odds with each other. The age of Ursa Minor
($\sim 12$ Gyr) is much longer than the crossing time for stars inside
Ursa Minor of $\sim 150$ Myr (assuming a typical velocity of
10 km s$^{-1}$). The crossing times for the stars in the cold and
velocity offset objects are $\sim 50$ Myr. These objects have had time
to make multiple orbits around Ursa Minor, and it is hard to see how
they could have survived given the short crossing times unless they
have been recently captured by Ursa Minor and are now in the process
of being tidally disrupted (which would account for the inflated
velocity dispersions). However, this is not a likely scenario, since
Ursa Minor probably fell into the Milky Way early, between 8 and
11 Gyr ago (Rocha et al. 2011), and capturing a large object after that
is unlikely. It is more reasonable to assume that these objects have
survived for a long time because they were protected by dark matter
halos of their own. The reality is probably more complicated: these
objects may have their own dark matter halos and at the same time be
tidally disrupted. These implications are intimately tied to the dark
matter halo of Ursa Minor, and pinning down the properties of these
objects would help to determine whether the dark matter halo of Ursa
Minor has a cusp or a core.
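The crossing-time comparison is simple unit arithmetic, $t_{\rm cross} = R/v$, with 1 kpc per 1 km s$^{-1}$ corresponding to about 978 Myr. The radii used for the quoted $\sim 150$ Myr and $\sim 50$ Myr are not stated in the text, so the values in the usage lines below are illustrative placeholders; the function name is hypothetical.

```python
KPC_KM = 3.0857e16  # km per kpc
MYR_S = 3.15576e13  # seconds per Myr (Julian years)


def crossing_time_myr(radius_kpc, velocity_kms):
    """Crossing time t = R / v, in Myr."""
    return radius_kpc * KPC_KM / velocity_kms / MYR_S


# Placeholder radii: a ~1.5 kpc system at 10 km/s and a ~0.2 kpc object at 4 km/s.
t_umi = crossing_time_myr(1.5, 10.0)
t_object = crossing_time_myr(0.2, 4.0)
```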



5 CONCLUSION 

We have presented a method for finding multiple localized
kinematically distinct populations (stellar substructure) in
line-of-sight velocity data. In the nearby dwarf spheroidal galaxy Ursa
Minor, we have found two secondary populations: the "cold" and
"velocity offset (vo)" objects. The estimated velocity dispersions are
$\sigma_{\rm cold} = 4.25 \pm 0.75$ km s$^{-1}$ and
$\sigma_{\rm vo} = 9.25 \pm 1.25$ km s$^{-1}$, and the estimated mean
velocities are $\bar v_{\rm cold} = -246.25 \pm 1.0$ km s$^{-1}$ and
$\bar v_{\rm vo} = -258.0 \pm 1.5$ km s$^{-1}$. They are located at
(0.25, -0.07) kpc (cold object) and (-0.24 ± 0.09, 0.23 ± 0.02) kpc
(velocity offset object) with respect to the center of Ursa Minor. The
location of the cold object matches that found earlier by Kleyna et al.
(2003), but our results reveal that the velocity dispersion of this cold
object could be large, with a mean value close to 4 km s$^{-1}$. To
assess the significance of our detections, we employed the Bayes
factor and the information criteria $D_{\rm KL}$ and DIC, supplemented
with the analysis of mock data sets with secondary populations, null
hypothesis mock data sets, and scrambled data sets. The two secondary
objects have ≥98.5% C.L. in all the model selection tests employed.

If the velocity dispersions are as large as our Bayesian analy- 
sis seems to indicate, then these objects are likely undergoing tidal 
disruption or are embedded in a dark matter halo. The two pos- 
sibilities are not exclusive of each other. If these objects are dark 
matter dominated, this would be the first detection of a satellite of 
a satellite galaxy. 

As emphasized by Kleyna et al. (2003), the presence of localized
substructure has important implications for the inner density profile
of the dark matter halo of Ursa Minor. The shape of the inner profile
(cusp or core) in turn has important implications for the properties
of the dark matter particle, with the cold dark matter model predicting
a cuspy inner density profile. If the stellar substructure is hosted by
its own dark matter halo, then it has further implications for dark
matter models, since this would likely be the smallest bound dark
matter structure discovered.